Efficient Algorithms in Analyzing Genomic Data

Authors

  • Feng Pan
  • Wei Wang
Abstract

Feng Pan: Efficient Algorithms in Analyzing Genomic Data. (Under the direction of Wei Wang.)

With the development of high-throughput, low-cost genotyping technologies, immense amounts of data can be produced cheaply and efficiently for various genetic studies. A typical dataset may contain hundreds of samples with millions of genotypes/haplotypes. To prevent data analysis from becoming a bottleneck, there is an evident need for fast and efficient analysis methods. My thesis focuses on two interesting and important genetic analysis problems.

  • Genome-wide association mapping. The goal of genome-wide association mapping is to identify genes or narrow regions in the genome that have statistically significant correlations with the given phenotypes. The discovery of these genes offers the potential for increased understanding of the biological processes affecting phenotypes such as body weight and blood pressure.
  • Sample selection for maximal genetic diversity. Given a large set of samples, it is usually more efficient to first conduct experiments on a small subset. The following question then arises: which subset should be used? There are many experimental scenarios where the ultimate objective is to maintain, or at least maximize, the genetic diversity within relatively small breeding populations.

In my thesis, I developed the following efficient and effective algorithms to address these problems.

  • Phylogeny-based genome-wide association mapping:
      – TreeQA: The algorithm uses local perfect phylogeny trees in genome-wide analysis for genotype/phenotype association mapping. Samples are partitioned according to the subtrees they belong to. The association between a tree and the phenotype is measured by statistical tests.
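The partition-then-test idea described for TreeQA can be illustrated with a minimal sketch. The abstract does not say which statistical test TreeQA applies, so the one-way ANOVA F-statistic used here is an assumption chosen only for illustration (it suits a quantitative phenotype such as body weight). The function names `partition_by_subtree` and `f_statistic` and the input dictionaries are hypothetical and are not taken from the thesis.

```python
from collections import defaultdict


def partition_by_subtree(subtree_of, phenotype):
    """Group phenotype values by the subtree each sample belongs to.

    subtree_of: dict mapping sample id -> subtree label (hypothetical input)
    phenotype:  dict mapping sample id -> quantitative phenotype value
    """
    groups = defaultdict(list)
    for sample, subtree in subtree_of.items():
        groups[subtree].append(phenotype[sample])
    return list(groups.values())


def f_statistic(groups):
    """One-way ANOVA F-statistic (an assumed choice of test, not TreeQA's own)."""
    groups = [g for g in groups if g]          # ignore empty groups
    k = len(groups)                            # number of groups (subtrees)
    n = sum(len(g) for g in groups)            # total number of samples
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more samples than groups")

    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]

    # Variation explained by the partition vs. variation left inside groups.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Toy data: four samples split across two subtrees of one local tree.
subtree_of = {"s1": "A", "s2": "A", "s3": "B", "s4": "B"}
phenotype = {"s1": 1.0, "s2": 2.0, "s3": 5.0, "s4": 6.0}

score = f_statistic(partition_by_subtree(subtree_of, phenotype))
print(score)
```

A larger score means the subtree partition separates the phenotype values more sharply. In a genome-wide setting, a score of this kind would be computed for each local tree and compared across the genome.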




Publication date: 2009